诸如对象检测和分割等密集的计算机视觉任务需要有效的多尺度特征表示,用于检测或分类具有不同大小的对象或区域。虽然卷积神经网络(CNNS)是这种任务的主导架构,但最近引入了视觉变压器(VITS)的目标是将它们替换为骨干。类似于CNN,VITS构建一个简单的多级结构(即,细致粗略),用于使用单尺度补丁进行多尺度表示。在这项工作中,通过从现有变压器的不同角度来看,我们探索了多尺度补丁嵌入和多路径结构,构建了多路径视觉变压器(MPVIT)。 MPVIT通过使用重叠的卷积贴片嵌入,将相同尺寸〜(即,序列长度,序列长度,序列长度的序列长度)嵌入不同尺度的斑块。然后,通过多个路径独立地将不同尺度的令牌独立地馈送到变压器编码器,并且可以聚合产生的特征,使得能够在相同特征级别的精细和粗糙的特征表示。由于多样化,多尺寸特征表示,我们的MPVits从微小〜(5m)缩放到基础〜(73米)一直在想象成分,对象检测,实例分段上的最先进的视觉变压器来实现卓越的性能,和语义细分。这些广泛的结果表明,MPVIT可以作为各种视觉任务的多功能骨干网。代码将在\ url {https://git.io/mpvit}上公开可用。
translated by 谷歌翻译
As an important variant of entity alignment (EA), multi-modal entity alignment (MMEA) aims to discover identical entities across different knowledge graphs (KGs) with multiple modalities like images. However, current MMEA algorithms all adopt KG-level modality fusion strategies but ignore modality differences among individual entities, hurting the robustness to potential noise involved in modalities (e.g., unidentifiable images and relations). In this paper we present MEAformer, a multi-modal entity alignment transformer approach for meta modality hybrid, to dynamically predict the mutual correlation coefficients among modalities for instance-level feature fusion. A modal-aware hard entity replay strategy is also proposed for addressing vague entity details. Extensive experimental results show that our model not only achieves SOTA performance on multiple training scenarios including supervised, unsupervised, iterative, and low resource, but also has limited parameters, optimistic speed, and good interpretability. Our code will be available soon.
translated by 谷歌翻译
Recent work has shown that fine-tuning large pre-trained language models on a collection of tasks described via instructions, a.k.a. instruction-tuning, improves their zero and few-shot generalization to unseen tasks. However, there is a limited understanding of the performance trade-offs of different decisions made during the instruction-tuning process. These decisions include the scale and diversity of the instruction-tuning benchmark, different task sampling strategies, fine-tuning with and without demonstrations, training using specialized datasets for reasoning and dialogue, and finally, the fine-tuning objectives themselves. In this paper, we characterize the effect of instruction-tuning decisions on downstream task performance when scaling both model and benchmark sizes. To this end, we create OPT-IML Bench: a large benchmark for Instruction Meta-Learning (IML) of 2000 NLP tasks consolidated into task categories from 8 existing benchmarks, and prepare an evaluation framework to measure three types of model generalizations: to tasks from fully held-out categories, to held-out tasks from seen categories, and to held-out instances from seen tasks. Through the lens of this framework, we first present insights about instruction-tuning decisions as applied to OPT-30B and further exploit these insights to train OPT-IML 30B and 175B, which are instruction-tuned versions of OPT. OPT-IML demonstrates all three generalization abilities at both scales on four different evaluation benchmarks with diverse tasks and input formats -- PromptSource, FLAN, Super-NaturalInstructions, and UnifiedSKG. Not only does it significantly outperform OPT on all benchmarks but is also highly competitive with existing models fine-tuned on each specific benchmark. We release OPT-IML at both scales, together with the OPT-IML Bench evaluation framework.
translated by 谷歌翻译
Partial differential equations (PDEs) are widely used for description of physical and engineering phenomena. Some key parameters involved in PDEs, which represents certain physical properties with important scientific interpretations, are difficult or even impossible to be measured directly. Estimation of these parameters from noisy and sparse experimental data of related physical quantities is an important task. Many methods for PDE parameter inference involve a large number of evaluations of numerical solution of PDE through algorithms such as finite element method, which can be time-consuming especially for nonlinear PDEs. In this paper, we propose a novel method for estimating unknown parameters in PDEs, called PDE-Informed Gaussian Process Inference (PIGPI). Through modeling the PDE solution as a Gaussian process (GP), we derive the manifold constraints induced by the (linear) PDE structure such that under the constraints, the GP satisfies the PDE. For nonlinear PDEs, we propose an augmentation method that transfers the nonlinear PDE into an equivalent PDE system linear in all derivatives that our PIGPI can handle. PIGPI can be applied to multi-dimensional PDE systems and PDE systems with unobserved components. The method completely bypasses the numerical solver for PDE, thus achieving drastic savings in computation time, especially for nonlinear PDEs. Moreover, the PIGPI method can give the uncertainty quantification for both the unknown parameters and the PDE solution. The proposed method is demonstrated by several application examples from different areas.
translated by 谷歌翻译
Many real-world reinforcement learning tasks require control of complex dynamical systems that involve both costly data acquisition processes and large state spaces. In cases where the transition dynamics can be readily evaluated at specified states (e.g., via a simulator), agents can operate in what is often referred to as planning with a \emph{generative model}. We propose the AE-LSVI algorithm for best-policy identification, a novel variant of the kernelized least-squares value iteration (LSVI) algorithm that combines optimism with pessimism for active exploration (AE). AE-LSVI provably identifies a near-optimal policy \emph{uniformly} over an entire state space and achieves polynomial sample complexity guarantees that are independent of the number of states. When specialized to the recently introduced offline contextual Bayesian optimization setting, our algorithm achieves improved sample complexity bounds. Experimentally, we demonstrate that AE-LSVI outperforms other RL algorithms in a variety of environments when robustness to the initial state is required.
translated by 谷歌翻译
Optimal Transport (OT) provides a useful geometric framework to estimate the permutation matrix under unsupervised cross-lingual word embedding (CLWE) models that pose the alignment task as a Wasserstein-Procrustes problem. However, linear programming algorithms and approximate OT solvers via Sinkhorn for computing the permutation matrix come with a significant computational burden since they scale cubically and quadratically, respectively, in the input size. This makes it slow and infeasible to compute OT distances exactly for a larger input size, resulting in a poor approximation quality of the permutation matrix and subsequently a less robust learned transfer function or mapper. This paper proposes an unsupervised projection-based CLWE model called quantized Wasserstein Procrustes (qWP). qWP relies on a quantization step of both the source and target monolingual embedding space to estimate the permutation matrix given a cheap sampling procedure. This approach substantially improves the approximation quality of empirical OT solvers given fixed computational cost. We demonstrate that qWP achieves state-of-the-art results on the Bilingual lexicon Induction (BLI) task.
translated by 谷歌翻译
We study the problem of efficient generative inference for Transformer models, in one of its most challenging settings: large deep models, with tight latency targets and long sequence lengths. Better understanding of the engineering tradeoffs for inference for large Transformer-based models is important as use cases of these models are growing rapidly throughout application areas. We develop a simple analytical model for inference efficiency to select the best multi-dimensional partitioning techniques optimized for TPU v4 slices based on the application requirements. We combine these with a suite of low-level optimizations to achieve a new Pareto frontier on the latency and model FLOPS utilization (MFU) tradeoffs on 500B+ parameter models that outperforms the FasterTransformer suite of benchmarks. We further show that with appropriate partitioning, the lower memory requirements of multiquery attention (i.e. multiple query heads share single key/value head) enables scaling up to 32x larger context lengths. Finally, we achieve a low-batch-size latency of 29ms per token during generation (using int8 weight quantization) and a 76% MFU during large-batch-size processing of input tokens, while supporting a long 2048-token context length on the PaLM 540B parameter model.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
In this study, to address the current high earlydetection miss rate of colorectal cancer (CRC) polyps, we explore the potentials of utilizing transfer learning and machine learning (ML) classifiers to precisely and sensitively classify the type of CRC polyps. Instead of using the common colonoscopic images, we applied three different ML algorithms on the 3D textural image outputs of a unique vision-based surface tactile sensor (VS-TS). To collect realistic textural images of CRC polyps for training the utilized ML classifiers and evaluating their performance, we first designed and additively manufactured 48 types of realistic polyp phantoms with different hardness, type, and textures. Next, the performance of the used three ML algorithms in classifying the type of fabricated polyps was quantitatively evaluated using various statistical metrics.
translated by 谷歌翻译
Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance and generalization to unseen tasks. In this paper we explore instruction finetuning with a particular focus on (1) scaling the number of tasks, (2) scaling the model size, and (3) finetuning on chain-of-thought data. We find that instruction finetuning with the above aspects dramatically improves performance on a variety of model classes (PaLM, T5, U-PaLM), prompting setups (zero-shot, few-shot, CoT), and evaluation benchmarks (MMLU, BBH, TyDiQA, MGSM, open-ended generation). For instance, Flan-PaLM 540B instruction-finetuned on 1.8K tasks outperforms PALM 540B by a large margin (+9.4% on average). Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints, which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models.
translated by 谷歌翻译